Abstract. We develop abstract learning frameworks for synthesis that 
embody the principles of the CEGIS (counterexample-guided inductive 
synthesis) algorithms in current literature. Our framework is based on 
iterative learning from a hypothesis space that captures synthesized 
objects, using counterexamples from an abstract sample space, and a 
concept space that abstractly defines the semantics of synthesis. We 
show that a variety of synthesis algorithms in current literature can be 
embedded in this general framework. We also exhibit three general recipes 
for convergent synthesis: the first two recipes based on finite spaces and 
Occam learners generalize all techniques of convergence used in existing 
engines, while the third, involving well-founded quasi-orderings, is new, 
and we instantiate it to concrete synthesis problems. 


1 Introduction 

The field of synthesis, which includes synthesizing controllers [44],
program expressions [51], program repairs [32], program translations
[15,29], loop invariants [22,21], and even entire programs [37,31], has
become a fundamental and vibrant subfield in programming languages. While
classical studies of synthesis have focused on synthesizing entire programs or
controllers from specifications [37,44], there is a surge of tractable methods that
have emerged in recent years for synthesizing small program expressions. These
expressions are often complex but small, and are applicable in niche domains
such as program sketching [51] (finding program expressions that complete code),
synthesizing Excel programs for string transformations [23], synthesizing
superoptimized code [48], deobfuscating code [27], synthesizing invariants to
help in verification [21,22], etc.

One prominent technique that has emerged in recent years for expression
synthesis is based on inductively learning expressions from samples. Assume the
synthesis problem is to synthesize an expression $e$ that satisfies some
specification $\psi(e)$. The crux of this approach is to ignore the precise
specification $\psi$, and instead synthesize an expression based on certain
facets of the specification. These incomplete facets of the specification are
often much simpler in structure and in logical complexity compared to the
specification, and hence synthesizing an expression satisfying the constraints
the facets impose is more tractable. The learning-based approach to synthesis
hence happens in rounds: in each round, the learner synthesizes an expression
that satisfies the current facets, and a verification oracle checks whether the
expression satisfies the actual specification $\psi$, and if not, finds a new
facet of the specification witnessing this. The learner then continues to
synthesize by adding this new facet to its collection.

The counterexample-guided inductive synthesis (CEGIS) approach [50] to
synthesis in current literature philosophically advocates precisely this kind of
inductive synthesis. The CEGIS approach has emerged as a powerful technique in
several domains of both program synthesis and program verification, ranging
from synthesizing program invariants for verification [21,22] to specification
mining [3], program expressions that complete sketches [51], superoptimization [48],
control [28], string transformers for spreadsheets [23], protocols [54], etc.

The goal of this paper is to develop a theory of iterative learning-based
synthesis through a formalism we call abstract learning frameworks for synthesis.
The framework we develop aims to be general and abstract, encompassing several 
known CEGIS frameworks as well as several other synthesis algorithms not 
generally viewed as CEGIS. The goal of this line of work is to build a framework, 
with accompanying concepts, definitions, and vocabulary that can be used to 
understand and combine learning-based synthesis across different domains. 

An abstract learning framework (ALF) (see Figure 1 on Page 5) consists
of three spaces: $\mathcal{H}$, $\mathcal{S}$, and $\mathcal{C}$. The (semantic)
concept space $\mathcal{C}$ gives semantic descriptions of the concepts that we
wish to synthesize, the hypothesis space $\mathcal{H}$ comprises restricted
(typically syntactically restricted) forms of the concepts to synthesize, and
the sample space $\mathcal{S}$ consists of samples (modeling facets of the
specification) from which the learner synthesizes hypotheses. The spaces
$\mathcal{H}$ and $\mathcal{S}$ are related by a variety of functions that give
semantics to samples and semantics to hypotheses using the space $\mathcal{C}$.
The conditions imposed on these relations capture the learning problem
precisely, and their abstract formulation facilitates modeling a variety of
synthesis frameworks in the literature.

The target for synthesis is specified as a set of semantic concepts. This is an 
important digression from classical learning frameworks, where often one can 
assume that there is a particular target concept that the learner is trying to 
learn. Note that in synthesis problems, we must implement the teacher as well, 
and hence the modeling of the target space is important. In synthesis problems, 
the teacher does not have a single target in mind nor does she know explicitly 
the target set (if she knew, there would be no reason to synthesize!). Rather, she 
knows the properties that capture the set of target concepts. For instance, in 
invariant synthesis, the teacher knows the properties of a set being an invariant for 
a loop, and this defines implicitly a set of invariants as target. The teacher needs 
to examine a hypothesis and check whether it satisfies the properties defining the 
target set. Consequently, we can view the teacher as a verification oracle that 
checks whether a hypothesis belongs to the implicitly defined target set. 

We exhibit a variety of existing synthesis frameworks that can be naturally
seen as instantiations of our abstract learning framework, where the formulation
shows the diversity in the instantiations of the spaces. These include (a) a variety
of CEGIS-based synthesis techniques for synthesizing program expressions in
sketches (completing program sketches [51], synthesizing loop-free programs [24],
mining specifications [28], synthesizing synchronization code for concurrent
programs [13], etc.), (b) synthesis from input-output examples such as Flashfill [23],
(c) the CEGIS framework applied to the concrete problem of solving synthesis
problems expressed in the SMT-based SyGuS format [2,1], and three synthesis
engines that use learning to synthesize solutions, (d) invariant synthesis frameworks,
including Houdini [18] and the more recent ICE-learning model for synthesizing
loop invariants [21], spanning a variety of domains from arithmetic [21,22] to
quantified invariants over data structures [20], and (e) synthesizing fixed-points
and abstract transformers in abstract interpretation settings [52].

Formalizing synthesis algorithms as ALFs can help highlight the nuances
of different learning-based synthesis algorithms, even for the same problem. One
example comprises two inductive learning approaches for synthesizing program
invariants: one based on the ICE learning model [21], and a second that uses any
synthesis engine for logically specified synthesis problems in the SyGuS format,
which can express invariant synthesis. Though both can be seen as CEGIS-based
synthesis algorithms, their sample spaces are very different, and hence the
synthesis algorithms are also different. The significant performance differences
between SyGuS-based solvers and ICE-based solvers (the latter performing better)
in the recent SyGuS competition (invariant-synthesis track) suggest that this
choice may be crucial [4]. Another example is given by two classes of CEGIS-based
solvers for synthesizing linear integer arithmetic functions against SyGuS
specifications: one is based on a sample space that involves purely inputs to the
function being synthesized [46,41], while the other is the more standard CEGIS
algorithm based on valuations of quantified variables.

We believe that just describing an approach as a learning-based synthesis
algorithm or a CEGIS algorithm does not convey the nuances of the approach;
it is important to precisely spell out the sample space and the semantics of this
space with respect to the space of hypotheses being learned. The ALF framework
gives the vocabulary for phrasing these nuances, allowing us to compare and
contrast different approaches.

Convergence: The second main contribution of this paper is to study
convergence issues in the general abstract learning-based framework for synthesis. We
first show that under the reasonable assumptions that the learner is consistent
(always proposes a hypothesis consistent with the samples it has received) and
the teacher is honest (gives a sample that distinguishes the current hypothesis
from the target set without ruling out any of the target concepts), the iterative
learning will always converge in the limit (though not necessarily in finite
time). This theorem vouches for the correctness of our abstract formalism in
capturing abstract learning, and utilizes all the properties that define ALFs.

We then turn to studying strategies for convergence in finite time. We propose
three general techniques for ensuring successful termination for the learner. First,
when the hypothesis space is bounded, it is easy to show that any consistent learner
(paired with an honest teacher) will converge in finite time. Several examples of
this exist in learning: learning conjunctions as in the Houdini algorithm [18],
learning Boolean functions (like decision-tree learning with purely Boolean
predicates as attributes) or functions over bit-vector domains (Sketch [51] and the
SyGuS solvers that work on bit-vectors), and learning invariants using specialized
forms of a finite class of automata that capture list/array invariants [20].

The second recipe is a formulation of the Occam's razor principle that uses
parsimony/simplicity as the learning bias [7]. The idea of using Occam's principle
in learning is prevalent (see Chapter 2 of [30] and [38]) though its universal
appeal in generalizing concepts is debatable [17]. We show, however, that learning
using Occam's principle helps in convergence. A learner is said to be an Occam
learner if there is a complexity ordering, which needs to be a total quasi-order
where the set of elements below any element is finite, such that the learner always
learns a smallest concept according to this order that is consistent with the
sample. We can then show that any Occam learner will converge to some target
concept, if one exists, in finite time. This result generalizes many convergent
learning mechanisms that we know of in the literature (for example, the convergent
ICE-learning algorithms for synthesizing invariants using constraint solvers [21],
and the enumerative solvers in almost every domain of synthesis [34,32,42,54],
including for SyGuS [2,1], that enumerate by dovetailing through expressions).

The first two recipes for finite convergence cover all the methods we know 
in the literature for convergent learning-based synthesis, to the best of our 
knowledge. The third recipe for finite convergence is a more complex one based 
on well-founded quasi orderings. This recipe is involved and calls for using clever 
initial queries that force the teacher to divulge information that then makes 
the learning space tractable. We do not know of any existing synthesis learning 
frameworks that use this natural recipe, but propose two new convergent learning 
algorithms following this recipe, one for intervals, and the other for conjunctive 
linear inequality constraints over a set of numerical attributes over integers. 

2 Abstract Learning Frameworks for Synthesis 

In this section we introduce our abstract learning framework for synthesis. Figure 1
gives an overview of the components and their relations that are introduced in
the following (ignore the target $\mathcal{T}$, $\gamma^{-1}(\mathcal{T})$, and
the maps $\tau$ and $\lambda$ for now). We explain these components in more
detail after the formal definition.

Definition 1 (Abstract Learning Frameworks). An abstract learning framework
for synthesis (ALF, for short) is a tuple
$\mathcal{A} = (\mathcal{C}, \mathcal{H}, (\mathcal{S}, \sqsubseteq_s, \sqcup, \bot_s), \gamma, \kappa)$,
with

- A class $\mathcal{C}$, called the concept space,

- A class $\mathcal{H}$, called the hypothesis space,

- A class $\mathcal{S}$, called the sample space, with a join semi-lattice
$(\mathcal{S}, \sqsubseteq_s, \sqcup, \bot_s)$ defined over it,

- A concretization function $\gamma : \mathcal{H} \to \mathcal{C}$, and

- A consistency function $\kappa : \mathcal{S} \to 2^{\mathcal{C}}$ satisfying
$\kappa(\bot_s) = \mathcal{C}$ and
$\kappa(S_1 \sqcup S_2) = \kappa(S_1) \cap \kappa(S_2)$ for all
$S_1, S_2 \in \mathcal{S}$. If the second condition is relaxed to
$\kappa(S_1 \sqcup S_2) \subseteq \kappa(S_1) \cap \kappa(S_2)$, we speak of a
general ALF.



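We say an ALF has a complete sample space if the sample space
$(\mathcal{S}, \sqsubseteq_s, \sqcup, \bot_s)$ is a complete join semi-lattice
(i.e., if the join is defined for arbitrary subsets of $\mathcal{S}$). In this
case, the consistency relation has to satisfy
$\kappa(\bigsqcup \mathcal{S}') = \bigcap_{S \in \mathcal{S}'} \kappa(S)$ for
each $\mathcal{S}' \subseteq \mathcal{S}$ (and
$\kappa(\bigsqcup \mathcal{S}') \subseteq \bigcap_{S \in \mathcal{S}'} \kappa(S)$
for general ALFs).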

As in computational learning theory (as presented, e.g., in [6] or [30]), we
consider a concept space $\mathcal{C}$, which contains the objects that we are
interested in. For example, in an invariant synthesis setting in verification,
an element $C \in \mathcal{C}$ would be a set of program configurations. In the
synthesis setting, $\mathcal{C}$ could contain the objects we would like to
synthesize, such as all functions from $\mathbb{Z}^n$ to $\mathbb{Z}$.

The hypothesis space $\mathcal{H}$ contains the objects that the learner
produces. These are representations of (some) elements from the concept space.
For example, if $\mathcal{C}$ consists of all functions from $\mathbb{Z}^n$ to
$\mathbb{Z}$, then $\mathcal{H}$ could consist of the set of all functions
expressible in linear arithmetic.

The relation between hypotheses and concepts is given by a concretization
function $\gamma : \mathcal{H} \to \mathcal{C}$ that maps hypotheses to concepts
(their semantics).

In classical computational learning theory for classification [30,38], one often 
considers samples consisting of positive and negative examples. If learning is used 
to infer a target concept that is not uniquely defined but rather should satisfy 
certain properties, then samples consisting of positive and negative examples are 
sometimes not sufficient. As we will show later, samples can be quite complex (see 
Section 4 for such examples, including implication counterexamples and grounded 
formulas). 

We work with a sample space, which is a bounded join-semilattice
$(\mathcal{S}, \sqsubseteq_s, \sqcup, \bot_s)$ (i.e., $\sqsubseteq_s$ is a
partial order over $\mathcal{S}$ with $\bot_s$ as the least element, and
$\sqcup$ is the binary least upper-bound operator on $\mathcal{S}$ with respect
to this ordering). An element $S \in \mathcal{S}$, when given by the teacher,
intuitively gives some information about a target specification. The join is
used by the learner to combine the samples returned as feedback by the teacher
during iterative learning. The least element $\bot_s$ corresponds to the empty
sample. We encourage the reader to think of the join as the union of samples.

Fig. 1. Components of an ALF

The consistency relation $\kappa$ captures the semantics of samples with respect
to the concept space by assigning to each sample $S$ the set $\kappa(S)$ of
concepts that are consistent with the sample. The first condition on $\kappa$
says that all concepts are consistent with the empty sample $\bot_s$. The second
condition says that the set of concepts consistent with the join of two samples
is precisely the set of concepts that are consistent with both samples.
Intuitively, this means that joining samples does not introduce new
inconsistencies, and existing inconsistencies transfer to bigger samples. The
condition that $\kappa(S_1 \sqcup S_2) \subseteq \kappa(S_1) \cap \kappa(S_2)$
is natural, as it says that if a concept is consistent with the join of two
samples, then the concept must be consistent with both of them individually. The
condition that $\kappa(S_1 \sqcup S_2) \supseteq \kappa(S_1) \cap \kappa(S_2)$
is debatable; it claims that samples when taken together cannot eliminate a
concept that they couldn't eliminate individually. We therefore mention the
notion of general ALF in Definition 1. However, we have not found any natural
example that requires such a generalization, and therefore prefer to work with
ALFs instead of general ALFs in the rest of the paper. In Definition 4, we
comment on what needs to be adapted to make the results of the paper go through
for general ALFs.
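
The two conditions on the consistency relation can be checked mechanically on a
small finite instance. The following is only a sketch under assumptions that are
not from the paper: a hypothetical toy setting in which concepts are the subsets
of {0, 1, 2}, a sample is a pair (P, N) of positive and negative points, the
join is componentwise union, and a concept is consistent with (P, N) if it
contains P and is disjoint from N. All names are illustrative.

```python
from itertools import combinations

UNIVERSE = (0, 1, 2)  # hypothetical toy domain


def powerset(xs):
    """All subsets of xs as frozensets."""
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


CONCEPTS = powerset(UNIVERSE)                            # concept space C
SAMPLES = [(p, n) for p in CONCEPTS for n in CONCEPTS]   # sample space S
BOTTOM = (frozenset(), frozenset())                      # the empty sample


def join(s1, s2):
    """Join of two samples: componentwise union."""
    return (s1[0] | s2[0], s1[1] | s2[1])


def kappa(sample):
    """Concepts that contain every positive point and no negative point."""
    pos, neg = sample
    return {c for c in CONCEPTS if pos <= c and not (neg & c)}


def check_alf_conditions():
    """Check kappa(bottom) = C and kappa(S1 join S2) = kappa(S1) & kappa(S2)."""
    if kappa(BOTTOM) != set(CONCEPTS):
        return False
    return all(kappa(join(a, b)) == kappa(a) & kappa(b)
               for a in SAMPLES for b in SAMPLES)
```

In this toy setting the equality holds for every pair of samples, so the
instance is an ALF rather than merely a general ALF.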

The following simple but useful observation on monotonicity of the consistency 
relation (which easily follows from the property of the consistency relation, for 
ALFs and general ALFs) is used in some of the proofs. 

Remark 1. If $S_1 \sqsubseteq_s S_2$, then $\kappa(S_2) \subseteq \kappa(S_1)$.
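
The remark can be derived in one line. The derivation below uses only the
standard lattice fact that $S_1 \sqsubseteq_s S_2$ implies
$S_1 \sqcup S_2 = S_2$, together with the relaxed second condition on $\kappa$,
so it applies to ALFs and general ALFs alike:

```latex
\[
S_1 \sqsubseteq_s S_2
\;\Longrightarrow\; S_1 \sqcup S_2 = S_2
\;\Longrightarrow\; \kappa(S_2) = \kappa(S_1 \sqcup S_2)
   \subseteq \kappa(S_1) \cap \kappa(S_2) \subseteq \kappa(S_1).
\]
```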

Some other auxiliary definitions we will need: We define
$\kappa_{\mathcal{H}}(S) := \{H \in \mathcal{H} \mid \gamma(H) \in \kappa(S)\}$
to be the set of hypotheses that are consistent with $S$. For a sample
$S \in \mathcal{S}$ we say that $S$ is realizable if there exists a hypothesis
that is consistent with $S$ (i.e., $\kappa_{\mathcal{H}}(S) \neq \emptyset$).

ALF Instances and Learners An instance of a learning task for an ALF is given 
by a specification that defines target concepts. The goal is to infer a hypothesis 
whose semantics is such a target concept. In classical computational learning 
theory, this target is a unique concept. In applications for synthesis, however, 
there can be many possible target concepts, for example, all inductive invariants 
of a program loop. 

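Formally, a target specification is just a set
$\mathcal{T} \subseteq \mathcal{C}$ of concepts. An ALF instance combines an ALF
and a target specification:

Definition 2 (ALF Instance). An ALF instance is a pair
$(\mathcal{A}, \mathcal{T})$ where
$\mathcal{A} = (\mathcal{C}, \mathcal{H}, (\mathcal{S}, \sqsubseteq_s, \sqcup, \bot_s), \gamma, \kappa)$
is an ALF and $\mathcal{T} \subseteq \mathcal{C}$ is a target specification.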

The goal of learning-based synthesis is for the learner to synthesize some
element $H \in \mathcal{H}$ such that $\gamma(H) \in \mathcal{T}$. Furthermore,
the role of the teacher is to instruct the learner by giving reasons why the
hypothesis produced by the learner in the current round does not belong to the
target set.

There is a subtle point here worth emphasizing. In synthesis frameworks, 
the teacher does not explicitly know the target space T. Rather she knows a 
definition of the target space, and she can examine a hypothesis H and check 
whether it satisfies the properties required of the target set. For instance, when 
synthesizing an invariant for a program, the teacher knows the properties of the 
invariant (inductiveness, etc.) and gives counterexample samples based on failed 
properties. 

We say that the target specification is realizable by a hypothesis, or simply
realizable, if there is some $H \in \mathcal{H}$ with
$\gamma(H) \in \mathcal{T}$. For a hypothesis $H \in \mathcal{H}$, we often
write $H \in \mathcal{T}$ instead of $\gamma(H) \in \mathcal{T}$.



As in classical computational learning theory, we define a learner (see Figure 1) 
to be a function that maps samples to hypotheses, and a consistent learner to be 
a learner that only proposes consistent hypotheses for samples. 

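Definition 3. A learner for an ALF
$\mathcal{A} = (\mathcal{C}, \mathcal{H}, (\mathcal{S}, \sqsubseteq_s, \sqcup, \bot_s), \gamma, \kappa)$
is a map $\lambda : \mathcal{S} \to \mathcal{H}$ that assigns a hypothesis to
every sample. A consistent learner is a learner $\lambda$ with
$\gamma(\lambda(S)) \in \kappa(S)$ for all realizable samples
$S \in \mathcal{S}$.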

Iterative learning. In the iterative learning setting, the learner produces a
hypothesis starting from some initial sample (e.g., $\bot_s$). For each
hypothesis provided by the learner that does not satisfy the target
specification, a teacher (see Figure 1) provides feedback by returning a sample
witnessing that the hypothesis does not satisfy the target specification.

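Definition 4. Let $(\mathcal{A}, \mathcal{T})$ be an ALF instance with
$\mathcal{A} = (\mathcal{C}, \mathcal{H}, (\mathcal{S}, \sqsubseteq_s, \sqcup, \bot_s), \gamma, \kappa)$
and $\mathcal{T} \subseteq \mathcal{C}$. A teacher for this ALF instance is a
function $\tau : \mathcal{H} \to \mathcal{S}$ that satisfies the following two
properties:

i) Progress: $\tau(H) = \bot_s$ for each target element $H \in \mathcal{T}$, and
$\gamma(H) \notin \kappa(\tau(H))$ for all $H \notin \mathcal{T}$, and

ii) Honesty: $\mathcal{T} \subseteq \kappa(\tau(H))$ for each
$H \in \mathcal{H}$ (see footnote 4).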

Firstly, progress says that if the hypothesis is in the target set, then the teacher
must return the "empty" sample $\bot_s$, signaling that the learner has learned a
target; otherwise, the teacher must return a sample that rules out the current
hypothesis. This ensures that a consistent learner can never propose the same
hypothesis again, and hence makes progress. Secondly, honesty demands that the
sample returned by the teacher is consistent with all target concepts. This ensures
that the teacher does not eliminate any element of the target set arbitrarily.
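
Progress and honesty can be tested exhaustively when the hypothesis space is
small. The sketch below assumes a hypothetical toy instance that is not from
the paper: a hypothesis is a threshold t in {0, ..., 5} denoting the concept
{x | x >= t} (so the concretization is left implicit), samples are sets of
labelled points, the join is set union, and the target set is {2, 3}.

```python
HYPOTHESES = range(6)   # threshold t stands for the concept {x | x >= t}
TARGET = {2, 3}         # hypothetical target set
BOTTOM = frozenset()    # the empty sample


def consistent(t, sample):
    """True if threshold t agrees with every labelled point in the sample."""
    return all((x >= t) == label for x, label in sample)


def kappa(sample):
    """Thresholds (standing for their concepts) consistent with the sample."""
    return {t for t in HYPOTHESES if consistent(t, sample)}


def teacher(t):
    """Return the empty sample on targets, otherwise one labelled point."""
    if t in TARGET:
        return BOTTOM
    if t < min(TARGET):
        return frozenset({(t, False)})   # t itself must be classified negative
    return frozenset({(t - 1, True)})    # t - 1 must be classified positive


def satisfies_progress(tau):
    """Targets get the empty sample; every other hypothesis is ruled out."""
    return all(tau(t) == BOTTOM if t in TARGET else t not in kappa(tau(t))
               for t in HYPOTHESES)


def satisfies_honesty(tau):
    """Every returned sample is consistent with all target concepts."""
    return all(TARGET <= kappa(tau(t)) for t in HYPOTHESES)
```

A teacher that answers every hypothesis above the targets with the labelled
point (2, True) still rules those hypotheses out, but it also eliminates the
target 3 and therefore fails the honesty check.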

When the learner and teacher interact iteratively, the learner produces a
sequence of hypotheses, where in each round it proposes a hypothesis
$\lambda(S)$ for the current sample $S \in \mathcal{S}$, and then adds the
feedback $\tau(\lambda(S))$ of the teacher to obtain the new sample.

4 For general ALFs one has to require that the least upper bound of all samples returned
by the teacher is consistent with all targets (and for non-complete sample lattices
the least upper bound of all possible finite sets of samples returned by the teacher).

Definition 5. Let $(\mathcal{A}, \mathcal{T})$ be an ALF instance with
$\mathcal{A} = (\mathcal{C}, \mathcal{H}, (\mathcal{S}, \sqsubseteq_s, \sqcup, \bot_s), \gamma, \kappa)$
and $\mathcal{T} \subseteq \mathcal{C}$. Let
$\lambda : \mathcal{S} \to \mathcal{H}$ be a learner, and let
$\tau : \mathcal{H} \to \mathcal{S}$ be a teacher. The combined behavior of the
learner $\lambda$ and teacher $\tau$ is the function
$f_{\tau,\lambda} : \mathcal{S} \to \mathcal{S}$, where
$f_{\tau,\lambda}(S) := S \sqcup \tau(\lambda(S))$.

The sequence of hypotheses generated by the learner $\lambda$ and teacher $\tau$
is the transfinite sequence
$(S^{\alpha}_{\tau,\lambda} \mid \alpha \in \mathbb{O})$, where $\mathbb{O}$
denotes the class of all ordinals, obtained by iterative application of
$f_{\tau,\lambda}$:

- $S^{0}_{\tau,\lambda} := \bot_s$;

- $S^{\alpha+1}_{\tau,\lambda} := f_{\tau,\lambda}(S^{\alpha}_{\tau,\lambda})$
for successor ordinals; and

- $S^{\alpha}_{\tau,\lambda} := \bigsqcup_{\beta < \alpha} S^{\beta}_{\tau,\lambda}$
for limit ordinals.

If the sample lattice is not complete, the above definition is restricted to the
first two items and yields a sequence indexed by natural numbers.
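
Restricted to natural-number indices, the combined behavior is an ordinary
loop. The following minimal sketch covers only the finite rounds (the limit
step for ordinals is not modelled) and assumes that samples support equality
tests. The function `iterate`, the round budget `max_rounds`, and the toy
learner and teacher over thresholds 0..5 with targets {2, 3} are hypothetical
illustrations, not part of the paper.

```python
def iterate(learner, teacher, bottom, join, max_rounds=100):
    """Repeatedly apply S -> S join teacher(learner(S)), starting from bottom.

    Returns (hypothesis, rounds) once the teacher answers with the empty
    sample, or (None, max_rounds) if the round budget is exhausted.
    """
    sample = bottom
    for rounds in range(max_rounds):
        hypothesis = learner(sample)
        feedback = teacher(hypothesis)
        if feedback == bottom:      # progress property: hypothesis is a target
            return hypothesis, rounds
        sample = join(sample, feedback)
    return None, max_rounds


def toy_teacher(t):
    """Honest teacher for thresholds: labelled points that both targets respect."""
    if t in (2, 3):
        return frozenset()
    return frozenset({(t, False)}) if t < 2 else frozenset({(t - 1, True)})


def toy_learner(sample):
    """Consistent learner: the largest threshold in 0..5 agreeing with the sample."""
    return max(t for t in range(6)
               if all((x >= t) == label for x, label in sample))
```

With these toy definitions,
`iterate(toy_learner, toy_teacher, frozenset(), lambda a, b: a | b)` proposes
5, then 4, then 3, and returns `(3, 2)`.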

The following lemma states that the teacher’s properties of progress and 
honesty transfer to the iterative setting for consistent learners if the target 
specification is realizable. 

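Lemma 1. Let $\mathcal{T}$ be realizable, $\lambda$ be a consistent learner, and
$\tau$ be a teacher. If $\mathcal{S}$ is a complete sample lattice, then

(a) the learner makes progress: for all $\alpha \in \mathbb{O}$, either
$\kappa(S^{\alpha}_{\tau,\lambda}) \supsetneq \kappa(S^{\alpha+1}_{\tau,\lambda})$
and $\lambda(S^{\alpha}_{\tau,\lambda}) \notin \kappa(S^{\alpha+1}_{\tau,\lambda})$,
or $\lambda(S^{\alpha}_{\tau,\lambda}) \in \mathcal{T}$, and

(b) the sample sequence is consistent with the target specification:
$\mathcal{T} \subseteq \kappa(S^{\alpha}_{\tau,\lambda})$ for all
$\alpha \in \mathbb{O}$.

If $\mathcal{S}$ is a non-complete sample lattice, then (a) and (b) hold for all
$\alpha \in \mathbb{N}$.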

Proof. The proof is a straightforward transfinite induction, using the properties
of the teacher and the consistency relation. For the case of non-complete sample
lattice, ignore the limit step in the proof below.

For part (b), the induction base is given by $\kappa(\bot_s) = \mathcal{C}$. The
induction step for limit ordinals directly follows from the property of the
consistency relation: all previous samples are consistent with the target
specification, so their join is, too (see footnote 5). For a successor ordinal
$\alpha + 1$ it follows from the fact that $S^{\alpha+1}_{\tau,\lambda}$ is a
join of two samples that are both consistent with the target specification.

For part (a), let $\alpha \in \mathbb{O}$ be such that
$H := \lambda(S^{\alpha}_{\tau,\lambda}) \notin \mathcal{T}$. Then
$S^{\alpha+1}_{\tau,\lambda} = S^{\alpha}_{\tau,\lambda} \sqcup S$ with
$S = \tau(H)$. In particular,
$S^{\alpha}_{\tau,\lambda} \sqsubseteq_s S^{\alpha+1}_{\tau,\lambda}$ and thus
$\kappa(S^{\alpha}_{\tau,\lambda}) \supseteq \kappa(S^{\alpha+1}_{\tau,\lambda})$
by Remark 1. For the strictness of the inclusion, note that
$\gamma(H) \in \kappa(S^{\alpha}_{\tau,\lambda})$ because $\lambda$ is a
consistent learner (and $S^{\alpha}_{\tau,\lambda}$ is realizable because it is
consistent with the realizable target specification by (b)). Furthermore,
$\gamma(H) \notin \kappa(S)$ by the progress property of the teacher, and hence
$\gamma(H) \notin \kappa(S^{\alpha+1}_{\tau,\lambda})$. □

5 For general ALFs one uses the property that the least upper bound of all samples
returned by the teacher is consistent with all targets.

We end with an example of an ALF. Consider the problem of synthesizing
guarded affine functions that capture how a piece of code $P$ behaves, as in program
deobfuscation. Then the concept class could be all functions from $\mathbb{Z}^n$
to $\mathbb{Z}$, and the hypothesis space would be the set of all expressions
describing a guarded affine function (in some fixed syntax). The target set (as
a subset of $\mathcal{C}$) would consist of a single function $\{f_t\}$, where
$f_t$ is the function computed by the program $P$. For any hypothesis function
$h$, let us assume we can build a teacher who can compare $h$ and $P$ for
equivalence, and, if they differ, return a counterexample of the form $(i, o)$,
where $i$ is a concrete input on which $h$ differs from $P$, and $o$ is the
output of $P$ on $i$. Then the sample space would consist of sets of such pairs
(with union for join and empty set for $\bot_s$). The set of functions
consistent with a set of samples would be those that map the inputs mentioned in
the samples to their appropriate outputs. The iterative learning will then model
the process of synthesizing, using learning, a guarded affine function that is
equivalent to $P$.
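
The teacher and the consistency check of this example can be sketched as
follows. Everything concrete here is an assumption made for illustration: the
stand-in program `program` (absolute value), the representation of hypotheses
as Python callables on a single integer, and above all the bounded input range,
which replaces a genuine equivalence check by exhaustive testing.

```python
def program(x):
    """Hypothetical stand-in for the code P being deobfuscated."""
    return x if x >= 0 else -x


def teacher(h, inputs=range(-10, 11)):
    """Return {(i, P(i))} for the first tested input where h differs from P.

    Returns the empty set if h agrees with P on all tested inputs. A real
    teacher would use an equivalence check instead of bounded testing.
    """
    for i in inputs:
        if h(i) != program(i):
            return frozenset({(i, program(i))})
    return frozenset()


def is_consistent(h, sample):
    """h is consistent with a sample iff it maps each recorded input to its output."""
    return all(h(i) == o for i, o in sample)
```

For instance, the hypothesis `lambda x: x` is answered with the pair (-10, 10),
which rules it out, while the guarded affine function
`lambda x: x if x >= 0 else -x` is answered with the empty sample.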


3 Convergence of iterative learning 

In this section, we study convergence of the iterative learning process. We start 
with a general theorem on transfinite convergence (convergence in the limit) for 
complete sample lattices. We then turn to convergence in finite time and exhibit 
three recipes that guarantee convergence. 

From Lemma 1 one can conclude that the transfinite sequence of hypotheses 
constructed by the learner converges to a target set. 

Theorem 1. Let $\mathcal{S}$ be a complete sample lattice, $\mathcal{T}$ be
realizable, $\lambda$ be a consistent learner, and $\tau$ be a teacher. Then
there exists an ordinal $\alpha$ such that
$\lambda(S^{\alpha}_{\tau,\lambda}) \in \mathcal{T}$.

Proof. Let $\alpha$ be an ordinal with cardinality bigger than $|\mathcal{H}|$
(bigger than $|\mathcal{S}|$ also works). If
$\lambda(S^{\beta}_{\tau,\lambda}) \notin \mathcal{T}$ for all $\beta < \alpha$,
then Lemma 1(a) implies that all $\lambda(S^{\beta}_{\tau,\lambda})$ for
$\beta < \alpha$ are pairwise different, which contradicts the cardinality
assumption. □

The above theorem ratifies the choice of our definitions, and the proof (relying
on Lemma 1) crucially uses all aspects of our definitions (the honesty and progress
properties of the teacher, the condition imposed on $\kappa$ in an ALF, the notion of
consistent learners, etc.).

Convergence in finite time is clearly the more desirable notion, and we propose
tactics for designing learners that converge in finite time. For an ALF instance
$(\mathcal{A}, \mathcal{T})$, we say that a learner $\lambda$ converges for a
teacher $\tau$ if there is an $n \in \mathbb{N}$ such that
$\lambda(S^{n}_{\tau,\lambda}) \in \mathcal{T}$, which means that $\lambda$
produces a target hypothesis after $n$ steps. We say that $\lambda$ converges if
it converges for every teacher. We say that $\lambda$ converges from a sample
$S$ in case the learning process starts from the sample $S$ (i.e., if
$S^{0}_{\tau,\lambda} = S$).


Finite hypothesis spaces We first note that if the hypothesis space (or the
concept space) is finite, then any consistent learner converges: by Lemma 1, the
learner always makes progress, and hence never proposes two hypotheses that
correspond to the same concept. Consequently, the learner only produces a finite
number of hypotheses before finding one that is in the target (or declaring that
no such hypothesis exists).

There are several synthesis engines using learning that use finite hypothesis 
spaces. For example, Houdini [18] is a learner of conjunctions over a fixed finite 
set of predicates and, hence, has a finite hypothesis space. Learning decision trees 
over purely Boolean attributes (not numerical) [45] is also convergent because of 
finite hypothesis spaces, and this extends to the ICE learning model as well [22]. 
Invariant generation for arrays and lists using elastic QDAs [20] also uses a 
convergence argument that relies on a finite hypothesis space. 



Occam Learners We now discuss the most robust strategy we know for
convergence, based on the Occam's razor principle. Occam's razor advocates
parsimony or simplicity [7]: the simplest concept/theory that explains a set of
observations is better, as a virtue in itself. There are several learning
algorithms that use parsimony as a learning bias in machine learning (e.g.,
pruning in decision-tree learning [38]), though the general applicability of
Occam's razor in machine learning as a sound means to generalize is debatable
[17]. We now show that in iterative learning, following Occam's principle leads
to convergence in finite time. However, simplicity itself is not the technical
reason for convergence; rather, it is the existence of some ordering of concepts
that biases the learning.

Enumerative learners are a good example of this. In enumerative learning, 
the learner enumerates hypotheses in some order, and always conjectures the 
first consistent hypothesis. In an iterative learning-based synthesis setting, such 
a learner always converges on some target concept, if one exists, in finite time. 
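
An enumerative learner can be sketched in a few lines. The sketch below assumes
a hypothetical hypothesis space that is not from the paper: affine functions
a*x + b with integer coefficients, enumerated by increasing |a| + |b|, with
samples given as sets of (input, output) pairs. The search only terminates on
realizable samples, since the enumeration itself is infinite.

```python
from itertools import count


def enumerate_hypotheses():
    """Hypothetical enumeration of pairs (a, b) by increasing |a| + |b|."""
    for size in count(0):
        for a in range(-size, size + 1):
            rest = size - abs(a)
            for b in sorted({-rest, rest}):
                yield (a, b)


def enumerative_learner(sample):
    """Return the first enumerated (a, b) consistent with all (x, y) pairs.

    Does not terminate if no affine function is consistent with the sample.
    """
    for a, b in enumerate_hypotheses():
        if all(a * x + b == y for x, y in sample):
            return (a, b)
```

On the empty sample the learner conjectures (0, 0); on the sample
{(0, 1), (2, 5)} it skips every pair of smaller size and returns (2, 1).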

Requiring a total order of the hypotheses is in some situations too strict.
If, for example, the hypothesis space consists of deterministic finite automata
(DFAs), we could build a learner that always produces a DFA with the smallest
possible number of states that is consistent with the given sample. However, the
relation $\preceq$ that compares DFAs w.r.t. their number of states is not an
ordering because there are different DFAs with the same number of states.

In order to capture such situations, we work with a total quasi-order $\preceq$
on $\mathcal{H}$ instead of a total order. A quasi-order (also called preorder)
is a transitive and reflexive relation. The relation being total means that
$H \preceq H'$ or $H' \preceq H$ for all $H, H' \in \mathcal{H}$. The difference
to an order relation is that $H \preceq H'$ and $H' \preceq H$ can hold in a
quasi-order, even if $H \neq H'$.

In analogy to enumerations, we require that each hypothesis has only finitely
many hypotheses "before" it w.r.t. $\preceq$, as expressed in the following
definition.

Definition 6. A complexity ordering is a total quasi-order $\preceq$ such that
for each $x \in \mathcal{H}$ the set $\{y \in \mathcal{H} \mid y \preceq x\}$ is
finite.

The example of comparing DFAs with respect to their number of states is 
such a complexity ordering. 

Definition 7. A consistent learner that always constructs a smallest hypothesis
with respect to a complexity ordering $\preceq$ on $\mathcal{H}$ is called a
$\preceq$-Occam learner.

Example 1. Consider $\mathcal{H} = \mathcal{C}$ to be the interval domain over
the integers consisting of all intervals of the form $[l, r]$, where
$l, r \in \mathbb{Z} \cup \{-\infty, \infty\}$ and $l \le r$. We define
$[l, r] \preceq [l', r']$ if either $[l, r] = [-\infty, \infty]$ or
$\max\{|x| \mid x \in \{l, r\} \cap \mathbb{Z}\} \le \max\{|x| \mid x \in \{l', r'\} \cap \mathbb{Z}\}$.
For example, $[-4, \infty] \preceq [1, 7]$ because $4 \le 7$. This ordering
$\preceq$ satisfies the property that for each interval $[l, r]$ the set
$\{[l', r'] \mid [l', r'] \preceq [l, r]\}$ is finite (because there are only
finitely many intervals using integer constants with a bounded absolute value).
A standard positive/negative sample $S = (P, N)$ with
$P, N \subseteq \mathbb{Z}$ is consistent with all intervals that contain the
elements from $P$ and do not contain an element from $N$. A learner that maps
$S$ to an interval that uses integers with the smallest possible absolute value
(while being consistent with $S$) is a $\preceq$-Occam learner. For example,
such a learner would map the sample $(P = \{-2, 5\}, N = \{-8\})$ to the
interval $[-2, \infty]$.
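
The interval learner of Example 1 can be sketched by searching intervals in
order of increasing complexity. This is only one possible realization, under
assumptions of our own: intervals are pairs of Python numbers with `math.inf`
for the infinite endpoints, the measure of $[-\infty, \infty]$ is encoded as
-1, and the search is cut off at a bound beyond which no new consistent
interval can appear, so that unrealizable samples yield `None`.

```python
import math

INF = math.inf


def measure(interval):
    """Largest absolute value of a finite endpoint; -1 for [-inf, inf]."""
    finite = [abs(e) for e in interval if e not in (-INF, INF)]
    return max(finite) if finite else -1


def consistent(interval, pos, neg):
    """The interval contains every positive point and no negative point."""
    lo, hi = interval
    return (all(lo <= p <= hi for p in pos)
            and not any(lo <= n <= hi for n in neg))


def candidates(k):
    """All intervals [l, r] with l <= r whose measure is exactly k."""
    if k == -1:
        return [(-INF, INF)]
    lows = [-INF] + list(range(-k, k + 1))
    highs = list(range(-k, k + 1)) + [INF]
    return [(l, r) for l in lows for r in highs
            if l <= r and measure((l, r)) == k]


def occam_interval_learner(pos, neg):
    """Return a consistent interval of minimal measure, or None if none exists."""
    bound = max((abs(x) for x in pos | neg), default=0) + 1
    for k in range(-1, bound + 1):
        for interval in candidates(k):
            if consistent(interval, pos, neg):
                return interval
    return None
```

Calling `occam_interval_learner({-2, 5}, {-8})` returns `(-2, math.inf)`, which
matches the interval given in the example. When several consistent intervals
share the minimal measure, the sketch simply returns the first one it finds.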

The next theorem shows that $\preceq$-Occam learners ensure convergence in finite
time.

Theorem 2. If $\mathcal{T}$ is realizable and $\lambda$ is a $\preceq$-Occam
learner, then $\lambda$ converges. Furthermore, the learner converges to a
$\preceq$-minimal target element.

Proof. Pick any target element $T \in \mathcal{H}$, which exists because
$\mathcal{T}$ is realizable. Since $\tau$ is honest,
$T \in \kappa(S^{n}_{\tau,\lambda})$ for all $n$ by Lemma 1(b). Thus, on the
iterated sample sequence, a $\preceq$-Occam learner never constructs an element
which is strictly above $T$ w.r.t. $\preceq$. Since there are only finitely many
hypotheses that are not strictly above $T$, and since the learner always makes
progress according to Lemma 1, it converges to a target element in finitely many
steps, which itself does not have any other target elements below, and thus is
$\preceq$-minimal. □

There are several existing algorithms in the literature that use such orderings 
to ensure convergence. Several enumeration-based solvers are convergent because 
of the ordering of enumeration (e.g., the generic enumerative solver for SyGuS 
problems [2,1]). The invariant generation over conditional linear arithmetic
expressions described in [21] ensures convergence using a total quasi-order
based on the number of conditionals and the values of the coefficients. The learner
uses templates to restrict the number of conditionals and a constraint solver to
find small coefficients for linear constraints.


Convergence using Tractable Well-Founded Quasi-Orders The third
strategy for convergence in finite time that we propose is based on well-founded
quasi-orders, or simply well-quasi-orders. Interestingly, we know of no existing
learning algorithm in the literature that uses this recipe for convergence (a
technique of similar flavor is used in [11]). We exhibit in this section a learning
algorithm for intervals and for conjunctions of inequalities of numerical attributes 
based on this recipe. A salient feature of this recipe is that the convergence 
actually uses the samples returned by the teacher in order to converge (the 
first two recipes articulated above, on the other hand, would even guarantee 
convergence if the teacher just replies yes/no when asked whether the hypothesis 
is in the target set). 

A binary relation $\preceq$ over some set $A$ is a well-quasi-order if it is
transitive and reflexive, and for each infinite sequence $a_0, a_1, a_2, \dots$
there are indices $i < j$ such that $a_i \preceq a_j$. In other words, there are
no infinite descending chains and no infinite anti-chains for $\preceq$.

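Definition 8. Let $(\mathcal{A}, \mathcal{T})$ be an ALF instance with
$\mathcal{A} = (\mathcal{C}, \mathcal{H}, (\mathcal{S}, \sqsubseteq_s, \sqcup, \bot_s), \gamma, \kappa)$.
A subset of hypotheses $\mathcal{W} \subseteq \mathcal{H}$ is called wqo-tractable if

(a) there is a well-quasi-order $\preceq_{\mathcal{W}}$ on $\mathcal{W}$, and

(b) for each realizable sample $S \in \mathcal{S}$ with
$\kappa_{\mathcal{H}}(S) \subseteq \mathcal{W}$, there is some
$\preceq_{\mathcal{W}}$-maximal hypothesis in $\mathcal{W}$ that is consistent with $S$.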



Example 2. Consider again the example of intervals over
$\mathbb{Z} \cup \{-\infty, \infty\}$ with samples of the form $S = (P, N)$ (see
Example 1). Let $p \in \mathbb{Z}$ be a point and let $\mathcal{I}_p$ be the set
of all intervals that contain the point $p$. Then, $\mathcal{I}_p$ is
wqo-tractable with the standard inclusion relation for intervals, defined by
$[l, r] \subseteq [l', r']$ iff $l \ge l'$ and $r \le r'$. Restricted to intervals
that contain the point $p$, this is the product of two well-founded orders on the
sets $\{x \in \mathbb{Z} \mid x \le p\}$ and $\{x \in \mathbb{Z} \mid x \ge p\}$,
and as such is itself well-founded [25, Theorem 2.3]. Furthermore, for each
realizable sample $(P, N)$, there is a unique maximal interval over
$\mathbb{Z} \cup \{-\infty, \infty\}$ that contains $P$ and excludes $N$. Hence,
the two conditions of wqo-tractability are satisfied. (Note that this ordering on
the set of all intervals is not a well-quasi-order; the sequence
$[-\infty, 0], [-\infty, -1], [-\infty, -2], \dots$ witnesses this.)

On a wqo-tractable $\mathcal{W} \subseteq \mathcal{H}$ a learner can ensure
convergence by always proposing a maximal consistent hypothesis, as stated in the
following lemma.

Lemma 2. Let $\mathcal{T}$ be realizable, $\mathcal{W} \subseteq \mathcal{H}$ be
wqo-tractable with well-quasi-order $\preceq_{\mathcal{W}}$, and $S$ be a sample
such that $\kappa_{\mathcal{H}}(S) \subseteq \mathcal{W}$. Then, there exists a
learner that converges from the sample $S$.

Proof. For any sample $S'$ with $S \sqsubseteq_s S'$, the set
$\kappa_{\mathcal{H}}(S')$ of hypotheses consistent with $S'$ is a subset of
$\mathcal{W}$. Therefore, there is some $\preceq_{\mathcal{W}}$-maximal element in
$\mathcal{W}$ that is consistent with $S'$. The strategy of the learner is to
return such a maximal hypothesis. Assume, for the sake of contradiction, that such
a learner $\lambda$ does not converge from $S$ for some teacher $\tau$. Let
$H_0, H_1, \dots$ be the infinite sequence of hypotheses produced by $\lambda$ and
$\tau$ starting from $S$, and let $S_0, S_1, S_2, \dots$ be the corresponding
sequence of samples (with $S_0 = S$). Since $\preceq_{\mathcal{W}}$ is a
well-quasi-order, there are $i < j$ with $H_i \preceq_{\mathcal{W}} H_j$. However,
$S_i \sqsubseteq_s S_j$ because $S_j$ is obtained from $S_i$ by joining answers of
the teacher. Therefore, $H_j$ is also consistent with $S_i$ (Remark 1). This
contradicts the choice of $H_i$ as a maximal hypothesis that is consistent with
$S_i$. □

As shown in Example 2, for each $p \in \mathbb{Z}$, the set $\mathcal{I}_p$ of
intervals containing $p$ is wqo-tractable. Using this, we can build a convergent
learner starting from the empty sample $\bot_s$. First, the learner proposes the
empty interval. The teacher must either confirm that this is a target or return a
positive example, that is, a point $p$ that is contained in every target interval.
Hence, the set of hypotheses consistent with this sample is wqo-tractable and the
learner can converge from here on as stated in Lemma 2. In general, the strategy
for the learner is to force in one step a sample $S$ such that the set
$\kappa_{\mathcal{H}}(S) = \mathcal{I}_p$ is wqo-tractable. This is generalized in
the following definition.

Definition 9. We say that an ALF is wqo-tractable if there is a finite set
$\{H_1, \dots, H_n\}$ of hypotheses such that $\kappa_{\mathcal{H}}(S)$ is
wqo-tractable for all samples $S$ that are inconsistent with all $H_i$, that is,
$\kappa_{\mathcal{H}}(S) \cap \{H_1, \dots, H_n\} = \emptyset$.

As explained above, the interval ALF is wqo-tractable with the set
$\{H_1, \dots, H_n\}$ consisting only of the empty interval.
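The interval learner described above can be sketched as follows. The code is a
minimal illustration, not the authors' implementation: the names are
hypothetical, the empty interval is encoded as `None`, and the teacher is an
assumed honest teacher for a target set that consists of a single interval with
finite integer bounds.

```python
import math

INF = math.inf

def maximal_interval_learner(pos, neg):
    """Propose the empty interval (None) on the empty sample, otherwise the
    unique maximal interval that contains pos and excludes neg."""
    if not pos:
        return None
    lo, hi = min(pos), max(pos)
    if any(lo <= n <= hi for n in neg):
        raise ValueError("sample is not realizable by any interval")
    left = max((n + 1 for n in neg if n < lo), default=-INF)
    right = min((n - 1 for n in neg if n > hi), default=INF)
    return (left, right)

def teacher(hypothesis, target):
    """Hypothetical honest teacher whose only target is `target`.
    Returns None on success, otherwise a labeled counterexample."""
    tl, tr = target
    if hypothesis is None or hypothesis[0] > tl:
        return ("pos", tl)
    l, r = hypothesis
    if r < tr:
        return ("pos", tr)
    if l < tl:
        return ("neg", tl - 1)
    if r > tr:
        return ("neg", tr + 1)
    return None

def learn(target):
    """Run the learner against the teacher and record every proposal."""
    pos, neg, proposals = set(), set(), []
    while True:
        hypothesis = maximal_interval_learner(pos, neg)
        proposals.append(hypothesis)
        answer = teacher(hypothesis, target)
        if answer is None:
            return proposals
        label, point = answer
        (pos if label == "pos" else neg).add(point)

print(learn((3, 10)))
```

For the target $[3, 10]$ the proposals are the empty interval, then
$[-\infty, \infty]$, $[3, \infty]$, and finally $[3, 10]$: after the first
positive point, every proposal is the maximal interval consistent with the
sample.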

Combining all the previous observations, we obtain convergence for wqo- 
tractable ALFs. 



Theorem 3. For every ALF instance $(\mathcal{A}, \mathcal{T})$ such that
$\mathcal{A}$ is wqo-tractable and $\mathcal{T}$ is realizable, there is a
convergent learner.

Proof. A convergent learner can be built as follows. Let $\{H_1, \dots, H_n\}$ be
the finite set of hypotheses from the definition of wqo-tractability of an ALF.

- As long as the current sample is consistent with some $H_i$, propose such an
  $H_i$.

- Otherwise, the current sample $S$ is such that $\kappa_{\mathcal{H}}(S)$ is
  wqo-tractable, and thus the learner can apply the strategy from Lemma 2. □

A convergent learner for conjunctive linear inequality constraints. We have
illustrated wqo-tractability for intervals in Example 2. We finish this section by
showing that this generalizes to higher dimensions, that is, to the domain of
$n$-dimensional hyperrectangles in $(\mathbb{Z} \cup \{-\infty, \infty\})^n$, which
form the hypothesis space in this example. Each such hyperrectangle is a product
of intervals over $(\mathbb{Z} \cup \{-\infty, \infty\})^n$. Note that
hyperrectangles can, e.g., be used to model conjunctive linear inequality
constraints over a set $f_1, \dots, f_n : \mathbb{Z}^d \to \mathbb{Z}$ of numerical
attributes.

The sample space depends on the type of target specification that we are 
interested in. We consider here the typical sample space of positive and negative 
samples (however, the reasoning below also works for other sample spaces, e.g., 
ICE sample spaces that additionally include implications). So, samples are of the
form $S = (P, N)$, where $P, N$ are sets of points in $\mathbb{Z}^n$ interpreted as
positive and negative examples (as for intervals, see Example 1).

The following lemma provides the ingredients for building a convergent learner 
based on wqo-tractability. 

Lemma 3. (a) For each realizable sample $S = (P, N)$, there are maximal
hyperrectangles that are consistent with $S$ (possibly more than one).

(b) For each $p \in \mathbb{Z}^n$, the set $\mathcal{R}_p$ of hyperrectangles
containing $p$ is well-quasi-ordered by inclusion.

Proof. For the first claim, note that for each increasing chain
$R_0 \subseteq R_1 \subseteq \cdots$ of hyperrectangles that are all consistent
with $S$, the union $R := \bigcup_{i \ge 0} R_i$ is also a hyperrectangle that is
consistent with $S$. More precisely, if
$R_i = [l^i_1, r^i_1] \times \cdots \times [l^i_n, r^i_n]$, then
$R = [l_1, r_1] \times \cdots \times [l_n, r_n]$ with
$l_j = \inf\{l^i_j \mid i \ge 0\}$ and $r_j = \sup\{r^i_j \mid i \ge 0\}$.

Furthermore, if the chain is strictly increasing, then there exists $j$ such that
$l_j = -\infty$ and all $l^i_j \ne -\infty$, or $r_j = \infty$ and all
$r^i_j \ne \infty$. Hence, if $R$ itself can be extended again into an infinite
strictly increasing chain, then the union of this chain will contain an additional
$\infty$ or $-\infty$. This can happen at most $2n$ times before reaching the
hyperrectangle containing all points, which is certainly maximal. Thus, there has
to be a maximal hyperrectangle consistent with $S$.

We now prove the second claim. For a point $p = (p_1, \dots, p_n)$, the set
$\mathcal{R}_p$ is the product
$\mathcal{R}_p = \mathcal{I}_{p_1} \times \cdots \times \mathcal{I}_{p_n}$ of the
sets of intervals containing the points $p_i$. Furthermore, the inclusion order for
hyperrectangles is the $n$-fold product of the inclusion order for intervals. Thus,
the inclusion order on $\mathcal{R}_p$ is a well-quasi-order because it is a
product of well-quasi-orders [25]. □



We conclude that the following type of learner is convergent: for the empty 
sample, propose the empty hyperrectangle; for every non-empty sample S, propose 
a maximal hyperrectangle consistent with S. 
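One way to compute a maximal consistent hyperrectangle is to start from the
bounding box of the positive points and to grow each face as far as the negative
points allow. The sketch below assumes this greedy strategy (it is not taken from
the text), uses hypothetical names, and encodes a hyperrectangle as a pair of
tuples of lower and upper bounds, with `None` for the empty hyperrectangle.

```python
import math

INF = math.inf

def maximal_hyperrectangle(pos, neg):
    """Return (lows, highs) of a maximal hyperrectangle consistent with the
    sample, or None (the empty hyperrectangle) if there are no positives."""
    if not pos:
        return None
    dims = len(next(iter(pos)))
    lows = [min(p[d] for p in pos) for d in range(dims)]
    highs = [max(p[d] for p in pos) for d in range(dims)]

    def inside_except(point, skip):
        # Is the point within the current bounds in every dimension but `skip`?
        return all(lows[e] <= point[e] <= highs[e]
                   for e in range(dims) if e != skip)

    if any(inside_except(n, None) for n in neg):
        raise ValueError("sample is not realizable by any hyperrectangle")
    for d in range(dims):
        # Negatives that would be swallowed when growing along dimension d.
        below = [n[d] for n in neg if inside_except(n, d) and n[d] < lows[d]]
        lows[d] = max(below) + 1 if below else -INF
        above = [n[d] for n in neg if inside_except(n, d) and n[d] > highs[d]]
        highs[d] = min(above) - 1 if above else INF
    return tuple(lows), tuple(highs)

print(maximal_hyperrectangle({(0, 0)}, {(2, 0), (0, 3)}))
```

A single pass over all faces suffices: once a face cannot be moved, later growth
of other faces only adds obstacles for it. Processing the dimensions in a
different order may yield a different maximal hyperrectangle, in line with
Lemma 3(a).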


4 Synthesis Problems Modeled as ALFs 

In this section, we list a host of existing synthesis problems and algorithms that 
can be seen as ALFs. Specifically, we consider examples from the areas of program 
verification and program synthesis. We encourage the reader to look up the 
referenced algorithms to better understand their mapping into our framework. 
Moreover, we have new techniques based on ALFs to compute fixed-points in the 
setting of abstract interpretation using learning. 


4.1 Program Verification 

While program verification itself does not directly relate to synthesis, most 
program verification techniques require some form of help from the programmer 
before the analysis can be automated. Consequently, synthesizing objects that 
replace manual help has been an area of active research. We here focus on learning 
loop invariants. Given adequate invariants (in terms of pre/post-conditions, loop 
invariants, etc.), the rest of the verification process can often be completely 
automated [19,26] using logical constraint-solvers [39,9]. For the purposes of 
this article, let us consider while-programs with a single loop. Given a pre- and 
post-condition, assertions, and contracts for functions called, the problem is to 
find a loop invariant that proves the post-condition and assertions (assuming the 
program is correct). 


Invariant synthesis using the ICE learning model Given a program with
a single loop whose loop invariant we want to synthesize, there are many inductive
invariants that prove the assertions in the program correct. These invariants are
characterized by the following three properties: (a) the invariant includes the
states in which the loop is entered the first time, (b) it excludes the states
that immediately exit the loop, reach the end of the program, and do not satisfy
the post-condition, and (c) it is inductive (i.e., from any state satisfying the
invariant, if we execute the loop body once, the resulting state is also in the
invariant). The teacher knows these properties, and must reply to conjectured
hypotheses of the learner using these properties. Violations of properties (a) and
(b) are usually easy to check using a constraint solver, and result in a positive
and a negative concrete configuration as a sample, respectively. However, when
inductiveness fails, the obvious counterexample is a pair of configurations
$(x, y)$, where $x$ is in the hypothesis but $y$ is not, and where the program
state $x$ evolves to the state $y$ across one execution of the loop body.

The work by Garg et al. [21] hence proposes what they call the ICE model (for
implication counterexamples), where the learner learns from positive, negative,
and implication counterexamples. The authors' claim is that without implication
counterexamples, the teacher is stuck when presented a hypothesis that satisfies
the properties of being an invariant save the inductiveness property.

From the described components we build an ALF
$\mathcal{A}_{ICE} = (\mathcal{C}, \mathcal{H}, \gamma, \mathcal{S}, \kappa)$,
where $\mathcal{C}$ is the set of all subsets of program configurations, the
hypothesis space $\mathcal{H}$ is the language used to describe the invariant, and
the sample space is defined as follows:

- A sample is of the form $S = (P, N, I)$, where $P, N$ are sets of program
  configurations (interpreted as positive and negative examples), and $I$ is a set
  of pairs of program configurations (interpreted as implications).

- A set $C \in \mathcal{C}$ of program configurations is consistent with
  $(P, N, I)$ if $P \subseteq C$, $N \cap C = \emptyset$, and if $(c, c') \in I$
  and $c \in C$, then also $c' \in C$.

- The order on samples is defined by component-wise set inclusion (i.e.,
  $(P, N, I) \sqsubseteq_s (P', N', I')$ if $P \subseteq P'$, $N \subseteq N'$,
  and $I \subseteq I'$).

- The join is the component-wise union, and
  $\bot_s = (\emptyset, \emptyset, \emptyset)$.

Since this sample space contains implications in addition to the standard positive 
and negative examples, we refer to it as an ICE sample space. 
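The ICE sample space defined above translates almost literally into code. The
following sketch is only an illustration: the class and method names are
hypothetical, and configurations are assumed to be hashable values such as
integers or tuples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IceSample:
    pos: frozenset = frozenset()    # positive configurations P
    neg: frozenset = frozenset()    # negative configurations N
    imp: frozenset = frozenset()    # implication pairs I

    def join(self, other):
        """Component-wise union of two samples."""
        return IceSample(self.pos | other.pos,
                         self.neg | other.neg,
                         self.imp | other.imp)

    def below(self, other):
        """Component-wise inclusion order on samples."""
        return (self.pos <= other.pos and self.neg <= other.neg
                and self.imp <= other.imp)

    def consistent(self, concept):
        """Is the set of configurations `concept` consistent with the sample?"""
        return (self.pos <= concept
                and not (self.neg & concept)
                and all(d in concept for (c, d) in self.imp if c in concept))

BOTTOM = IceSample()                # the empty sample

sample = BOTTOM.join(IceSample(pos=frozenset({0}), neg=frozenset({5})))
sample = sample.join(IceSample(imp=frozenset({(1, 2), (2, 3)})))
print(sample.consistent({0, 1, 2}), sample.consistent({0, 1, 2, 3}))
```

The set `{0, 1, 2}` is rejected because it contains 2 but not 3, which violates
the implication `(2, 3)`; adding 3 repairs the violation.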

We can now show that there is a teacher for these ALF instances, because a 
teacher can refute any hypothesis made by a learner with a positive, negative, 
or implication counterexample, depending on which property of invariants is 
violated. 

Proposition 1. There is a teacher for ALF instances of the form
$(\mathcal{A}_{ICE}, \mathcal{T}_{inv})$.

Furthermore, we can show that having only positive and negative samples
precludes the existence of teachers. In fact, we can show that if
$\mathcal{C} = 2^D$ (for a domain $D$) and the sample space $\mathcal{S}$ consists
of only positive and negative examples in $D$, then a target set $\mathcal{T}$ has
a teacher only if it is defined in terms of excluding a set $B$ and including a
set $G$.

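Lemma 4. Let $\mathcal{C} = \mathcal{H} = 2^D$, $\gamma = id$,
$\mathcal{S} = \{(P, N) \mid P, N \subseteq D\}$, and
$\kappa((P, N)) = \{R \subseteq D \mid P \subseteq R \wedge R \cap N = \emptyset\}$.

Let $\mathcal{T} \subseteq \mathcal{C}$ be a target. If there exists a teacher for
$\mathcal{T}$, then there must exist sets $B, G \subseteq D$ such that
$\mathcal{T} = \{R \subseteq D \mid R \cap B = \emptyset \text{ and } G \subseteq R\}$.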

Proof. Assume that there exists a teacher for the target set $\mathcal{T}$, and
let $G$ and $B$ be the union of all positive examples and the union of all negative
examples, respectively, that the teacher returns. Now, we claim that
$\mathcal{T} = \{R \subseteq D \mid R \cap B = \emptyset \text{ and } G \subseteq R\}$.
Towards a contradiction, assume that this is not the case. Then, there exists an
$R \in \mathcal{T}$ such that $B \cap R \ne \emptyset$ or $G \not\subseteq R$. If
$B \cap R \ne \emptyset$, then there is some $b \in R$ that was returned as a
negative counterexample for some hypothesis. Since $R \in \mathcal{T}$ is not
consistent with this negative example $b$, this contradicts the requirement that
the teacher is honest. Similarly, if $G \not\subseteq R$, then there is some
$g \in G$ that was returned as a positive counterexample, which contradicts the
teacher's honesty.



The above proves that positive and negative samples are not sufficient for 
learning invariants, as invariants cannot be defined as all sets that exclude a set 
of states and include a set of states. 

There are several ICE-based invariant synthesis formalisms that we can
capture. First, Garg et al. [21] have considered arithmetic invariants over a
set of integer variables $x_1, \dots, x_\ell$ of the form
$\bigvee_{i=1}^{m} \bigwedge_{j=1}^{n} \left( \sum_{k=1}^{\ell} a^{ij}_k x_k \le c^{ij} \right)$,
$a^{ij}_k \in \{-1, 0, 1\}$, where the learner is implemented using a constraint
solver that finds smallest invariants that fit the sample. This is accurately
modeled in our framework, as in the ICE formulation above, with the hypothesis
space being the set of all formulas of this form. The fact that Garg et al.'s
learner produces smallest invariants makes it an Occam learner in the sense of
Section 3 and, hence, it converges in finite time. The approach proposed by Sharma
and Aiken [49], C2I, is also an ICE-learner, except that the learner uses
stochastic search based on a Metropolis-Hastings MCMC (Markov chain Monte Carlo)
algorithm, which again can be seen as an ALF.

We can also see the work by Garg et al. [20] on synthesizing quantified
invariants for linear data structures such as lists and arrays as ALFs. This
framework can infer quantified invariants of the form

$\forall y_1, y_2 : (y_1 \le y_2 \le i) \Rightarrow a[y_1] \le a[y_2]$.

However, Garg et al. do not represent sets of configurations by means of logical
formulas (as shown above) but use an automata-theoretic approach, where a
special class of automata, called quantified data automata (QDAs), represent such
logical invariants; hence, these QDAs form the hypothesis space in the ALF. The
sample space there is also unusual: a sample (modeling a program configuration
consisting of arrays or lists) is a set of valuation words, where each such word
encodes the information about the array (or list) for specially quantified pointer
variables pointing into the heap, and where data-formulas state conditions of the
keys stored at these locations.

ALFs can also capture the ICE-framework described by Neider [40], where
invariants are learned in the context of regular model-checking [12]. In regular
model checking, program configurations are captured using (finite but unbounded,
or even infinite) words, and sets of configurations are captured using finite
automata. Consequently, the hypothesis space is the set of all DFAs (over an a
priori chosen, fixed alphabet), and the sample space is an ICE-sample consisting
of configurations modeled as words. The learner proposed by Neider constructs
consistent DFAs of minimal size and, hence, is an Occam learner that converges in
finite time (cf. Section 3).

We now turn to two other invariant-generation frameworks that skirt the ICE 
model. 

Houdini The Houdini algorithm [18] is a learning algorithm for synthesizing
invariants that avoids learning from ICE samples. Given a finite set of predicates
P, Houdini learns an invariant that is expressible as a conjunction of some subset
of predicates (note that the hypothesis space is finite but exponential in the
number of predicates). Houdini learns an invariant in time polynomial in the
number of predicates (and in a linear number of rounds) and is implemented in the
Boogie program verifier [8]. It is widely used (for example, in verifying device
drivers [35,36] and in race detection in GPU kernels [10]).

The setup here can be modeled as an ALF: we take the concept space $\mathcal{C}$
to be all subsets of program configurations, and the hypothesis space
$\mathcal{H}$ to be the set of all conjunctions of subsets of predicates in P,
with the map $\gamma$ mapping each conjunctive formula in $\mathcal{H}$ to the set
of all configurations that satisfy the predicates mentioned in the conjunction. We
take the sample space to be the ICE sample space, where each sample is a valuation
v over P (indicating which predicates are satisfied) and where implication
counterexamples are pairs of valuations.

The Houdini learning algorithm itself is the classical conjunctive learning 
algorithm for positive and negative samples (see [30]), but its mechanics are such 
that it works for ICE samples as well. More precisely, the Houdini algorithm 
always creates the semantically smallest formula that satisfies the sample (it 
hence starts with a conjunction of all predicates, and in each round “knocks 
off” predicates that are violated by positive samples returned). Since Houdini 
always returns the semantically-smallest conjunction of predicates, it will never 
receive a negative counterexample (assuming the program is correct and has a 
conjunctive invariant over P). Furthermore, for an implication counterexample 
(v,v'), the algorithm knows that since it proposed the semantically smallest 
conjunction of predicates, v cannot be made negative; hence it treats v' as a 
positive counterexample. Houdini converges since the hypothesis space is finite
(matching the first recipe for convergence we outlined in Section 3); in fact, it
converges in a linear number of rounds since in each round at least one predicate
is removed from the hypothesized invariant.
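The learner side of this procedure can be sketched as follows. This is an
illustration of the conjunctive learning scheme described above, not the Boogie
implementation; the function name is hypothetical, and a valuation is assumed to
be encoded as the set of predicates that hold in a configuration.

```python
def houdini_learner(predicates, positives, implications):
    """Return the semantically smallest conjunction (as a set of predicates)
    that is consistent with the positive valuations and implication pairs."""
    conjunction = set(predicates)           # start with all predicates
    for valuation in positives:
        conjunction &= valuation            # knock off violated predicates
    changed = True
    while changed:
        changed = False
        for left, right in implications:
            # If the left valuation satisfies the conjunction, the right one
            # has to satisfy it as well, so treat it as a positive example.
            if conjunction <= left and not conjunction <= right:
                conjunction &= right
                changed = True
    return conjunction

predicates = {"x >= 0", "y >= 0", "x <= y"}
positives = [frozenset({"x >= 0", "y >= 0"})]
implications = [(frozenset({"x >= 0", "y >= 0", "x <= y"}),
                 frozenset({"x >= 0"}))]
print(houdini_learner(predicates, positives, implications))
```

Each update strictly shrinks the set of predicates, so the loop ends after at
most as many updates as there are predicates. Negative examples play no role in
the sketch because, as noted above, the smallest conjunction never receives one
when a conjunctive invariant exists.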


Learning invariants in regular model-checking using witnesses The
learning-to-verify project reported in [56,55,57,58] applies machine learning to
the verification of infinite state systems that result from processes with FIFO
queues, but skirts the ICE model using a different idea. The key idea is to view
the identification of the reachable states of such a system as a machine learning
problem instead of computing this set iteratively (which, in general, requires
techniques such as acceleration or widening). In particular, we consider the work of
Vardhan et al. [56] and show that this is an instantiation of our abstract learning
framework. The key idea in this work is to represent configurations as traces
through the system and to add a notion of witness to this description, resulting
in so-called annotated traces. The teacher, when receiving a set of annotated
traces, can actually check whether the configurations are reachable based on the
witnesses (a witness can be, say, the length of the execution or the execution itself)
and, consequently, the model allows learning from such traces directly. Indeed,
this approach can be modeled by an ALF: the concept space consists of subsets
of configurations, the target space consists of the set of reachable configurations,
the hypothesis space consists of automata over annotated traces, and the sample
space consists of positively and negatively labeled annotated traces.


4.2 Synthesis of Fixpoints in Abstract Domains 

In Section 4.1 we have explained how several instances of ICE learners fit into
our framework. We now provide a generic technique to model the problem of
fixpoint computation in the setting of abstract interpretation using learning.

We assume the setting of abstract interpretation [16] with

- a concrete domain $(\mathcal{D}, \sqsubseteq, \bot, \sqcup)$ and an abstract
  domain $(\hat{\mathcal{D}}, \sqsubseteq, \bot, \sqcup)$, which both have a join
  lattice structure, and

- a Galois connection between these two, given by two monotone functions
  $\gamma : \hat{\mathcal{D}} \to \mathcal{D}$ (the concretization function) and
  $\alpha : \mathcal{D} \to \hat{\mathcal{D}}$ (the abstraction function), with
  $\alpha(X) \sqsubseteq \hat{X} \iff X \sqsubseteq \gamma(\hat{X})$ for all
  $X \in \mathcal{D}$ and $\hat{X} \in \hat{\mathcal{D}}$.

The concrete domain usually describes the semantics of a program and comes
with an increasing transformer $F : \mathcal{D} \to \mathcal{D}$ that captures the
behaviour of the program (increasing means that $X \sqsubseteq F(X)$ for each
$X$). The abstract domain is used to model the aspects of the program that one is
interested in.

A specification is given in terms of a set $B \subseteq \mathcal{D}$ of bad
concrete elements. The goal is to find a fixpoint of $F$ that is not above any bad
element, i.e., an element $X \in \mathcal{D}$ with $F(X) = X$, and
$Y \not\sqsubseteq X$ for each $Y \in B$. We refer to such fixpoints as adequate
fixpoints because they show that the program cannot reach a bad state.

Synthesis of Precise Abstract Transformers The general idea of abstract
interpretation is to do the fixpoint computation on the abstract domain instead of
the concrete domain. This is done using an abstract transformer
$\hat{F} : \hat{\mathcal{D}} \to \hat{\mathcal{D}}$ that overapproximates the
concrete transformer, in the sense that
$\alpha(F(\gamma(\hat{X}))) \sqsubseteq \hat{F}(\hat{X})$ for each
$\hat{X} \in \hat{\mathcal{D}}$. The best abstract transformer is the one where
equality holds instead of inclusion.

Since manually designing a good abstract transformer can be difficult, Thakur
et al. [53] propose an automatic method to synthesize the abstract post of a
given abstract element for the concrete domain consisting of sets of program
configurations $\mathcal{D} = 2^D$.

We can model the algorithm proposed by Thakur et al. [53] as a synthesis
using learning in an ALF that uses the concrete domain $\mathcal{D}$ as concept
space, the abstract domain $\hat{\mathcal{D}}$ as hypothesis space together with
the existing concretization function $\gamma$ as the concretization function for
the ALF. Since we are interested in elements above $\alpha(F(\gamma(\hat{X})))$,
the sample space simply consists of positive examples, that is, a sample is a
finite set of elements of $D$. A hypothesis is consistent with a sample if it
contains all the elements from the sample.

For an abstract element $\hat{X}$, the target specification is given by
$\mathcal{T} = \{T \subseteq D \mid \alpha(F(\gamma(\hat{X}))) \sqsubseteq T\}$.



If the learner proposes some hypothesis $H$ (which is an abstract element),
then the teacher can check whether $\gamma(H)$ is a superset of
$F(\gamma(\hat{X}))$ and if not, return a positive example from
$F(\gamma(\hat{X})) \setminus \gamma(H)$. This is precisely what happens
(at an abstract level) in the procedure proposed in [53].

One should note here that in case the abstract domain has a maximal element,
a learner could always propose this element because it can be sure that it is a
target element if the target is realizable. The idea is to come up with a learner
that computes a better solution than just the maximum. In [53] this is done
by starting with the empty hypothesis, and then for each positive example
returned by the teacher, applying the abstraction function $\alpha$ to it and
taking the join of the resulting abstract element with the current hypothesis. In
this way, the learner ensures that the hypothesis is always below
$\alpha(F(\gamma(\hat{X})))$. So if the learner converges, then it computes the
best abstract transformer on $\hat{X}$.
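The join-based learner described above can be illustrated on the interval
abstract domain. The sketch makes several simplifying assumptions that are not
part of [53]: the concrete post is a finite set given explicitly (a real teacher
would query a solver instead), abstract elements are pairs of bounds with `None`
as the least element, and all names are hypothetical.

```python
def alpha(point):
    """Abstraction of a single concrete element: the singleton interval."""
    return (point, point)

def join(a, b):
    """Join in the interval domain; None is the least (empty) element."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def contains(interval, point):
    return interval is not None and interval[0] <= point <= interval[1]

def teacher(hypothesis, concrete_post):
    """Return an element of the concrete post that the hypothesis misses."""
    for point in sorted(concrete_post):
        if not contains(hypothesis, point):
            return point
    return None

def learn_abstract_post(concrete_post):
    """Start from the empty hypothesis and join in abstracted examples."""
    hypothesis, rounds = None, 0
    while True:
        example = teacher(hypothesis, concrete_post)
        if example is None:
            return hypothesis, rounds
        hypothesis = join(hypothesis, alpha(example))
        rounds += 1

# Concrete post of the states {0, 1, 2, 3} under the statement x := 2 * x + 1.
post = {2 * x + 1 for x in range(4)}
print(learn_abstract_post(post))
```

In this run the hypothesis grows from the empty element to the interval
$[1, 7]$ in four rounds and never exceeds the abstraction of the concrete post,
which mirrors the argument that the learner stays below
$\alpha(F(\gamma(\hat{X})))$.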

Synthesis of Adequate Fixpoints. We now describe a generic way for designing 
an ALF that models the problem of finding an adequate fixpoint as a synthesis 
problem that uses learning from samples. This can be seen as a generalization 
of finding loop invariants in an abstract domain (but without computing the 
abstract transformers). 

The ALF we propose has the concrete domain $\mathcal{D}$ as concept space, the
abstract domain $\hat{\mathcal{D}}$ as hypothesis space together with the existing
concretization function $\gamma$. The sample space of an ALF is, in general,
designed for a specific class of
target specifications (in order to guarantee the existence of a teacher). In the 
following, we describe how to construct such a sample space for adequate fixpoints 
as targets. 

The invariants defined in Section 4.1 can be seen as adequate fixpoints for the 
transformer F and the set B being defined pointwise on single configurations (with 
concrete domain consisting of sets of program configurations). This pointwise 
definition is the reason why the sample space can be built up by using single
configurations as positive and negative examples, and pairs of configurations as 
implications. 

We replace the pointwise definition by a definition in terms of a class
$\mathcal{R} \subseteq \mathcal{D}$ of representative concrete elements. The
definition below is satisfied by the class of singleton sets in case of the
concrete domain consisting of sets of program configurations. It intuitively
states that $\mathcal{R}$ is rich enough to prove that a hypothesis is not a
fixpoint.

Definition 10. A set $\mathcal{R} \subseteq \mathcal{D}$ is a representative set
(for the transformer $F$) if for each $\hat{X} \in \hat{\mathcal{D}}$ and
$X := \gamma(\hat{X})$ such that $F(X) \ne X$, there are
$Y, Y' \in \mathcal{R}$ with $Y \sqsubseteq X$, $Y' \sqsubseteq F(Y)$, and
$Y' \not\sqsubseteq X$. □

It is not hard to see that $\mathcal{D}$ is always a representative set, and in
fact that the set of all elements of the form $\gamma(H)$ and $F(\gamma(H))$,
where $H \in \mathcal{H}$, also forms a representative set.

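We consider target specifications of adequate fixpoints that can be expressed in
terms of $\mathcal{R}$. We say that $\mathcal{T} \subseteq \mathcal{D}$ is an
$\mathcal{R}$-specification of adequate fixpoints if there is a set
$B \subseteq \mathcal{R}$ such that
$\mathcal{T} = \{X \in \mathcal{D} \mid F(X) = X \text{ and } Y \not\sqsubseteq X \text{ for all } Y \in B\}$.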



For this class of target specifications, we let the sample space
$\mathcal{S}_{\mathcal{R}}$ consist of ICE samples over $\mathcal{R}$, that is, of
triples $(P, N, I)$ of finite sets $P, N \subseteq \mathcal{R}$ and a finite set
$I \subseteq \mathcal{R} \times \mathcal{R}$. A concept $X \in \mathcal{D}$ is
consistent with a sample $(P, N, I)$ if $Y \sqsubseteq X$ for all $Y \in P$,
$Y \not\sqsubseteq X$ for all $Y \in N$, and if $Y \sqsubseteq X$, then
$Y' \sqsubseteq X$ for all $(Y, Y') \in I$. This consistency relation is denoted
by $\kappa_{\mathcal{R}}$. In analogy to Proposition 1 we can show that this
sample space is expressive enough to guarantee the existence of a teacher.

Proposition 2. There is a teacher for ALF instances of the form
$(\mathcal{A}, \mathcal{T})$ with
$\mathcal{A} = (\mathcal{D}, \hat{\mathcal{D}}, \gamma, \mathcal{S}_{\mathcal{R}}, \kappa_{\mathcal{R}})$
and an $\mathcal{R}$-specification of adequate fixpoints $\mathcal{T}$.

This provides a generic way of setting up a learning scenario for abstract
interpretation, and thus provides a powerful tool for understanding the
requirements for the application and development of machine learning algorithms
for the synthesis of adequate fixpoints.


4.3 Program Synthesis 

In this section, we study several examples of learning-based program synthesis, 
which include synthesizing program expressions, expressions to be plugged in 
program sketches, snippets of programs, etc., and show how they can be modeled 
as ALFs. 


End-user synthesis from examples: Flashfill One application of synthesis
is to use it to help end-users to program using examples. A prime example of
this is Flashfill by Gulwani et al. [23], where the authors show how string
manipulation macros from user-given input-output examples can be synthesized
in the context of Microsoft Excel spreadsheets. Flashfill can be seen as an ALF:
the concept space consists of all functions from strings to strings, the hypothesis
space consists of all string manipulation macros, and the sample space consists
of sets of input-output examples for such functions. The consistency relation
$\kappa$ maps each sample to all functions that agree with the sample. The role of
the teacher is played by the user: the user has some function in mind and gives new
input-output examples whenever the learner returns a hypothesis that she is not
satisfied with. The learning algorithm here is based on version-space algebras
(which, intuitively, compactly represent all possible macros with limited size
that are consistent with the sample) and in each round proposes a simple macro
from this collection.


Completing sketches and the SyGuS solvers The sketch-based synthesis
approach [50] is another prominent synthesis application, where programmers
write partial programs with holes and a system automatically synthesizes
expressions or programs for these holes so that a specification (expressed using
input-output pairs or logical assertions) is satisfied. The key idea here is that
given a sketch with a specification, we need expressions for the holes such that
for every possible input, the specification holds. This roughly has the form
$\exists e.\, \forall x.\, \psi(e, x)$, where $e$ are the expressions to synthesize
and $x$ are the inputs to the program.

The Sketch system works by (a) unfolding loops a finite number of times, hence,
bounding the length of executions, and (b) encoding the choice of expressions $e$
to be synthesized using bits (typically using templates and representing integers
by a small number of bits). For the synthesis step, the Sketch system implements
a CEGIS (counterexample-guided inductive synthesis) technique using SAT solving,
whose underlying idea is to learn the expressions from examples using only a SAT
solver. The CEGIS technique works in rounds: the learner proposes hypothesis
expressions and the teacher checks whether $\forall x.\, \psi(e, x)$ holds (using
SAT queries) and if not, returns a valuation for $x$ as a counterexample.
Subsequently, the learner asks, again using a SAT query, whether there exists a
valuation for the bits encoding the expressions such that $\psi(e, x)$ holds for
every valuation of $x$ returned by the teacher thus far; the resulting expressions
are the hypotheses for the next round. Note that the use of samples avoids
quantifier alternation both in the teacher and the learner.
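The round structure of CEGIS can be sketched with a toy example. The code below
is not the Sketch system: both SAT queries are replaced by brute-force search
over small finite ranges, and the specification, the 4-bit hole, and all names
are assumptions chosen for illustration.

```python
def spec(c, x):
    """psi(e, x): multiplying by the hole c equals (x << 1) + x on 8 bits."""
    return ((x * c) & 0xFF) == (((x << 1) + x) & 0xFF)

CANDIDATES = range(16)      # 4-bit encodings of the hole
INPUTS = range(256)         # 8-bit program inputs

def learner(samples):
    """Find a candidate that satisfies the spec on all inputs seen so far."""
    for c in CANDIDATES:
        if all(spec(c, x) for x in samples):
            return c
    return None             # no candidate fits: the sketch is unrealizable

def teacher(c):
    """Return an input on which the candidate violates the spec, if any."""
    for x in INPUTS:
        if not spec(c, x):
            return x
    return None

def cegis():
    samples = []
    while True:
        candidate = learner(samples)
        if candidate is None:
            return None, samples
        counterexample = teacher(candidate)
        if counterexample is None:
            return candidate, samples
        samples.append(counterexample)

print(cegis())
```

Here the learner first proposes 0, the teacher answers with the input 1, and the
next proposal 3 passes the check. Neither side ever handles the quantifier
alternation of $\exists e.\, \forall x.\, \psi(e, x)$ directly: the learner only
quantifies over candidates and the teacher only over inputs.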

The above system can be modeled as an ALF. The concept space consists of
tuples of functions modeling the various expressions to synthesize, the hypothesis
space is the set of expressions (or their bit encodings), the map $\gamma$ gives
meaning to these expressions (or encodings), and the sample space can be seen as
the set of grounded formulae of the form $\psi(e, v)$ where the variables $x$ have
been substituted with a concrete valuation. The relation $\kappa$ maps such a
sample to the set of all expressions $f$ such that the formulas in the sample all
evaluate to true if $f$ is substituted for $e$. The Sketch learner can be seen as
a learner in this ALF framework that uses calls to a SAT solver to find hypothesis
expressions consistent with the sample. Since expressions are encoded by a finite
number of bits, the hypothesis space is finite, and the Sketch learner converges
in finite time (cf. Section 3).

The SyGuS format [2] is a competition format for synthesis, and extends the
Sketch-based formalism above to SMT theories, with an emphasis on syntactic
restrictions for expressions. More precisely, SyGuS specifications are parameterized
over a background theory T, and an instance is a pair $(G, \psi(f))$ where $G$ is a
grammar that imposes syntactic restrictions for functions (or expressions) written
using symbols of the background theory, and $\psi$ is a formula, again in the
theory T, including function symbols $f$; the functions $f$ are typed according to
domains of T. The goal is to find functions $g$ for the symbols $f$ in the syntax
$G$ such that $\psi$ holds. The competition version further restricts $\psi$ to be
of the form $\forall x.\, \psi'(f, x)$ where $\psi'$ is a quantifier-free formula
in a decidable SMT theory; this way, given a hypothesis for the functions $f$, the
problem of checking whether the functions meet the specification is decidable.

There have been several solvers developed for SyGuS (cf. the first SyGuS 
competition [2,1]), and all of them are in fact learning-based (i.e., CEGIS) 
techniques. In particular, three solvers have been proposed: an enumerative solver, 
a constraint-based solver, and a stochastic solver. All these solvers can be seen 
as ALF instances: the concept space consists of all possible tuples of functions 
over the appropriate domains and the hypothesis space is the set of all functions 
allowed by the syntax of the problem (with the natural $\gamma$ relation giving its 
semantics). All three solvers work by generating a tuple of functions such that 
$\psi'(f, x)$ holds for all valuations of $x$ given by the teacher thus far. The 
enumerative solver enumerates functions until it reaches such a function, the 
stochastic solver searches the space of functions randomly, guided by a measure that 
depends on how many samples are satisfied, until it finds one that satisfies all the 
samples, and the constraint-based solver queries a constraint solver for 
instantiations of template functions so that the specification is satisfied on the 
sample valuations. Both the enumerative and the constraint-based solvers are Occam 
learners and, hence, converge in finite time. 
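
To make the enumerative strategy concrete, the following is a minimal sketch of a learner that enumerates expressions in order of increasing size and returns the first one satisfying the specification on all sample valuations. It is not the code of any SyGuS solver: the tiny grammar (variables `x`, `y`, constants `0`, `1`, binary `+` and `max`), the size bound `max_size`, and the helper names are all hypothetical.

```python
def expressions(size):
    """Yield (text, function) pairs for all expressions with exactly `size` nodes."""
    if size == 1:
        yield "x", lambda x, y: x
        yield "y", lambda x, y: y
        yield "0", lambda x, y: 0
        yield "1", lambda x, y: 1
        return
    # A binary operator uses one node; split the rest between its two operands.
    for left_size in range(1, size - 1):
        for lt, lf in expressions(left_size):
            for rt, rf in expressions(size - 1 - left_size):
                yield (f"({lt} + {rt})",
                       lambda x, y, a=lf, b=rf: a(x, y) + b(x, y))
                yield (f"max({lt}, {rt})",
                       lambda x, y, a=lf, b=rf: max(a(x, y), b(x, y)))


def enumerative_learner(samples, holds, max_size=5):
    """Return the first (smallest) expression consistent with every sample."""
    for size in range(1, max_size + 1):
        for text, f in expressions(size):
            if all(holds(f, v) for v in samples):
                return text
    return None


def holds(f, v):
    """Hypothetical quantifier-free spec, grounded at v: f computes a maximum."""
    x, y = v
    return f(x, y) >= x and f(x, y) >= y and f(x, y) in (x, y)


hypothesis = enumerative_learner([(1, 2), (2, 1)], holds)
```

Because candidates are visited in a fixed order of increasing size, the learner always proposes a smallest consistent expression, which is the kind of bias the text associates with Occam learners.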

Note that the learners know $\psi$ in this scenario. However, we can model SyGuS 
as ALFs by taking the sample space to be grounded formulas $\psi'(f, v)$ consisting 
of the specification with particular values $v$ substituted for $x$. The learners can 
now be seen as learning from these samples, without knowledge of $\psi$ (similar to 
the modeling of Sketch above). 

We would like to emphasize that this embedding of SyGuS as an ALF clearly 
showcases the difference between different synthesis approaches (as mentioned in 
the introduction). For example, invariant generation can be done using learning 
either by means of ICE samples (see Section 4.1) or modeled as a SyGuS problem. 
However, it turns out that the sample spaces (and, hence, the learners) in the 
two approaches are very different! In ICE-based learning, samples are only 
single configurations (labeled positive or negative) or pairs of configurations, 
while in a SyGuS encoding, the samples are grounded formulas that encode the 
entire program body, including instantiations of universally quantified variables 
that represent intermediate states in the execution of the loop. 
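
The contrast between the two sample spaces can be sketched on a toy loop. Everything below is an illustrative assumption rather than material from the cited tools: the loop `while x < n: x += 1` with precondition `x == 0 and n >= 0` and postcondition `x == n`, the function names, and the concrete sample values.

```python
def ice_consistent(inv, positives, negatives, implications):
    """ICE samples: single configurations and pairs of configurations."""
    return (all(inv(*c) for c in positives)
            and not any(inv(*c) for c in negatives)
            and all((not inv(*c)) or inv(*d) for c, d in implications))


def grounded_vc(inv, x, n):
    """SyGuS-style sample: the loop's verification condition grounded at (x, n)."""
    pre = (x == 0 and n >= 0)
    init = (not pre) or inv(x, n)                        # precondition implies inv
    step = (not (inv(x, n) and x < n)) or inv(x + 1, n)  # loop body preserves inv
    done = (not (inv(x, n) and not x < n)) or x == n     # inv and exit imply post
    return init and step and done


def good(x, n):
    return x <= n   # a correct invariant for the toy loop


def trivial(x, n):
    return True     # too weak: it admits states that violate the postcondition


ice_ok = ice_consistent(good, [(0, 3)], [(5, 3)], [((1, 3), (2, 3))])
vc_ok = grounded_vc(good, 5, 3)
```

An ICE learner only ever sees the configurations themselves, whereas a learner working on grounded formulas sees the program's transition structure inside each sample (here, the `x + 1` in `step`).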


Machine-learning based approaches to synthesis. One can use a passive 
machine-learning algorithm to synthesize hypotheses from samples in 
order to build a synthesis engine (along with an appropriate teacher that can 
furnish such samples). Recent work by Garg et al. [22] proposes an algorithm for 
synthesizing invariants in the ICE framework using machine-learning classifiers 
(decision trees) that can be viewed as an ALF. 


Synthesizing guarded affine functions. Recent work [47] explores the synthesis 
of guarded affine functions from a sample space that consists of information of 
the form $f(s) = t$, where $s$ and $t$ are integers. The learner here uses a combination 
of computational geometry techniques and decision tree learning, and can also be 
modeled as an ALF. Notice that this sample space precisely matches the sample 
space for deobfuscation problems, where the teacher can return counterexamples 
of this form using the program being deobfuscated (see the Example at the end of 
Section 2 on page 8). Consequently, the learner in Alchemist [47] can be used for 
deobfuscating programs that compute guarded affine functions from tuples of 
integer inputs to integers (like the “multiply by 45” example in [27]). 
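
As a small illustration of this sample space, the sketch below represents a guarded affine function as an ordered list of guarded pieces and checks it against input/output samples. The representation, the function names, and the sample values are assumptions chosen for illustration; they are not Alchemist's data structures, and no learning is performed here.

```python
def evaluate(pieces, s):
    """Evaluate a guarded affine function on the integer tuple s.

    Each piece is (guard, (coeffs, const)). A guard is None (always true) or
    (g_coeffs, g_const), read as: g_coeffs . s + g_const >= 0.
    The first piece whose guard holds determines the result.
    """
    def affine(coeffs, const):
        return sum(c * v for c, v in zip(coeffs, s)) + const

    for guard, body in pieces:
        if guard is None or affine(*guard) >= 0:
            return affine(*body)
    raise ValueError("no guard applies to the input")


def consistent(pieces, samples):
    """A hypothesis is consistent if it maps every sample input s to its output t."""
    return all(evaluate(pieces, s) == t for s, t in samples)


# Two hypothetical hypotheses: multiplication by 45 and absolute value.
times_45 = [(None, ((45,), 0))]
abs_value = [(((1,), 0), ((1,), 0)), (None, ((-1,), 0))]

# Samples as a teacher might return them when running the obfuscated program.
samples = [((1,), 45), ((2,), 90)]
```

Here `consistent(times_45, samples)` holds while `consistent(abs_value, samples)` does not, so a learner restricted to these two hypotheses would propose the first.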



Other synthesis engines. There are several algorithms that are self-described 
as CEGIS frameworks and, hence, can be modeled using ALFs. Examples include 
synthesizing loop-free programs [24], synthesizing synchronizing code for 
concurrent programs [14] (in this work, the sample space consists of abstract 
concurrent partially-ordered traces), work on using synthesis to mine 
specifications [28], synthesizing bit-manipulating programs and deobfuscating 
programs [27] (here, the use of a separate I/O oracle can be modeled as the teacher 
returning the output of the program together with a counterexample input), 
superoptimization [48], deductive program repair [32], synthesis of recursive 
functional programs over unbounded domains [33], as well as synthesis of protocols 
using enumerative CEGIS techniques [54]. 

5 Variations and Limitations of the Framework 

In this section we discuss some variations and limitations of our framework. We 
start by briefly discussing a variation of our framework that omits the concept 
space. 

5.1 Omitting the Concept Space 

We believe that, for a clean modeling of a synthesis problem, one should specify the 
concept space $\mathcal{C}$. This makes it possible to compare different synthesis approaches 
that work with different representations of hypotheses, and possibly different types 
of samples, over the same underlying concept space. 

However, for the actual learning process, the concept space itself is not of 
great importance because the learner proposes elements from the hypothesis 
space, and the teacher returns an element from the sample space. The concept 
space only serves as a semantic space that gives meaning to hypotheses (via the 
concretization function $\gamma$), and to the samples (via the consistency relation $\kappa$). 

Therefore, it is possible to omit the concept space from an ALF, and to directly 
specify the consistency of samples with hypotheses. Such a reduced ALF would 
then be of the form $\mathcal{A} = (\mathcal{H}, \mathcal{S}, \kappa)$ with a function $\kappa : \mathcal{S} \to 2^{\mathcal{H}}$. In the original 
framework, this corresponds to the function $\kappa_{\mathcal{H}}$ defined by $\kappa_{\mathcal{H}}(S) = \gamma^{-1}(\kappa(S))$. 

To create ALF instances, the target specification is also directly given as a 
subset of the hypothesis space, $T \subseteq \mathcal{H}$. All the other definitions can be adapted 
directly to this framework. 
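
The passage from the original to the reduced framework can be sketched on a finite toy instance: the consistency relation on hypotheses is obtained by pulling the consistency relation on concepts back through the concretization map. The hypothesis set, the three-element domain, and all function names below are assumptions made for illustration.

```python
DOMAIN = (0, 1, 2)

# Hypothetical hypothesis space: expressions, each with an evaluator.
HYPOTHESES = {
    "x + x": lambda x: x + x,
    "2 * x": lambda x: 2 * x,
    "x * x": lambda x: x * x,
}


def gamma(h):
    """Concretization: map an expression to its concept, a function on DOMAIN
    represented as the tuple of its outputs."""
    return tuple(HYPOTHESES[h](x) for x in DOMAIN)


def kappa_contains(sample, concept):
    """Consistency on concepts: a sample (x, y) admits every concept mapping x to y."""
    x, y = sample
    return concept[DOMAIN.index(x)] == y


def kappa_h(sample):
    """Reduced consistency: the preimage under gamma of the concepts the sample admits."""
    return {h for h in HYPOTHESES if kappa_contains(sample, gamma(h))}


consistent_with_sample = kappa_h((1, 2))
```

Note that `"x + x"` and `"2 * x"` denote the same concept, so the reduced relation can never separate them; this is the information that is lost when the concept space is left implicit.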

5.2 Limitations 

The ALF framework we develop in this paper is not meant to capture every 
existing method that uses learning from samples. There are several synthesis 
techniques that use grey-box techniques (a combination of black-box learning 
from samples and direct use of the specification of the target in some 
way) or use query models (where they query the teacher for various aspects of 
the target set). 



For instance, there are active iterative learning scenarios in which the learner 
can ask other types of questions to the teacher than just proposing hypotheses 
that are then accepted or refuted by the teacher. One prominent scenario of 
this kind is Angluin’s active learning of DFAs [5], where the learner can ask 
membership queries and equivalence queries. (The equivalence queries correspond 
to proposing a hypothesis, as in our framework, which is then refuted with a 
counterexample if it is not correct.) Such learning scenarios for synthesis are used, 
for example, in [3] for the synthesis of interface specifications for Java classes, and 
in [43] for automatically synthesizing assumptions for assume-guarantee reasoning. 
Our framework does not have a mechanism for directly modeling such queries. 
The ALF framework that we have presented is intentionally a simpler framework 
that captures and cleanly models emerging synthesis procedures in the 
literature where the learner only proposes hypotheses and learns from the samples 
that the teacher provides to show that a hypothesis is wrong. 
The learner in our framework, being a completely passive learner (as opposed 
to an active learner), can also be implemented by the variety of scalable passive 
machine-learning algorithms in vogue [38]. A clean extension of ALFs to query 
settings and grey-box settings would be an interesting future direction to pursue. 


6 Conclusions 

We have presented an abstract learning framework for synthesis that encompasses 
several existing techniques that use learning or counterexample-guided inductive 
synthesis to create objects that satisfy a specification. We were motivated by 
abstract interpretation [16] and how it gives a general framework and notation 
for verification; our formalism is an attempt at such a generalization for 
learning-based synthesis. The conditions we have proposed that the abstract concept 
spaces, hypothesis spaces, and sample spaces need to satisfy to define a 
learning-based synthesis domain seem to be cogent and general in forming a vocabulary 
for such approaches. We have also addressed various strategies for convergent 
synthesis that generalize and extend existing techniques (again, in a similar 
vein to how widening and narrowing in abstract interpretation give recipes 
for building convergent algorithms to compute fixed-points). We believe that the 
notation and general theorems herein will bring more clarity, understanding, and 
reuse of learners in synthesis algorithms. 